On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching
Authors
Abstract
We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$, we first show how to compute the suffix array interval of a given pattern of length $m$ in $O\left(\frac{m}{p} + \lg p + \lg\lg p \cdot \lg\lg n\right)$ time for $p \le m$. For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O\left(\frac{m^k \sigma^k}{p} \max(k, \lg\lg n) + \left(1 + \frac{m}{p}\right) \lg p \cdot \lg\lg n + \mathrm{occ}\right)$ time, where $\sigma$ is the size of the alphabet and $p \le \sigma^k m^k$. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: given the suffix array intervals for two patterns $P$ and $P'$, we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1 + \lg_p \lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).

1998 ACM Subject Classification: I.1.2 Algorithms
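To make the interval-merging idea concrete, the following is a minimal sequential sketch in Python. It is not the data structure from the abstract: it uses a plain binary search over an inverse suffix array, which takes O(lg n) time per merge rather than O(lg lg n). The function names (`build_suffix_array`, `sa_interval`, `merge_intervals`), the half-open 0-indexed interval convention, and the `$` sentinel are assumptions made for illustration only.

```python
def build_suffix_array(text):
    # Naive construction by sorting suffixes; adequate for a small illustration.
    return sorted(range(len(text)), key=lambda i: text[i:])


def sa_interval(text, sa, pattern):
    """Half-open range [lo, hi) of suffix array positions whose suffix starts with pattern."""
    m = len(pattern)

    def first_index(strictly_greater):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_index(False), first_index(True)


def merge_intervals(sa, isa, interval_p, len_p, interval_q):
    """Interval of PP' computed from the intervals of P and P' (illustrative only).

    Inside the interval of P, suffixes are ordered by what follows P, so the
    rank of the suffix starting len_p characters later increases with the
    suffix array position. That monotonicity allows a binary search.
    """
    lo_p, hi_p = interval_p
    lo_q, hi_q = interval_q

    def rank_after_p(i):
        j = sa[i] + len_p
        return isa[j] if j < len(sa) else -1  # empty remainder sorts first

    def first_with_rank_at_least(bound):
        lo, hi = lo_p, hi_p
        while lo < hi:
            mid = (lo + hi) // 2
            if rank_after_p(mid) < bound:
                lo = mid + 1
            else:
                hi = mid
        return lo

    return first_with_rank_at_least(lo_q), first_with_rank_at_least(hi_q)


text = "AGATCGATA$"  # '$' is a sentinel smaller than every other character
sa = build_suffix_array(text)
isa = [0] * len(sa)
for rank, pos in enumerate(sa):
    isa[pos] = rank

merged = merge_intervals(sa, isa, sa_interval(text, sa, "GA"), 2,
                         sa_interval(text, sa, "T"))
print(merged == sa_interval(text, sa, "GAT"))
```

The final line compares the merged interval for "GA" followed by "T" against a direct search for "GAT"; the two should agree whenever the text ends with a unique sentinel.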
Similar resources
Efficient de novo assembly of large genomes using compressed data structures - Supplemental Materials and Methods
The suffix array is a compact representation of the lexicographic ordering of the suffixes of a text [1]. Each element of the array is an index into the original string; $SA_X[i] = j$ indicates that the suffix starting at position $j$ in $X$ is the $i$-th lowest suffix in $X$. As an example consider the string $T$ = AGATCGATA$. The suffix array of $T$ is $SA_T$ = [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]. As the suffix a...
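The example in the snippet above can be checked with a short Python sketch. It sorts suffixes naively, which is far slower than the construction algorithms used in practice, and the function name `suffix_array_one_indexed` is an assumption introduced here. Positions are 1-indexed to match the snippet, and `$` compares lower than the letters in ASCII.

```python
def suffix_array_one_indexed(text):
    # Sort the 1-indexed start positions by the suffix beginning at each one.
    return sorted(range(1, len(text) + 1), key=lambda j: text[j - 1:])


example = suffix_array_one_indexed("AGATCGATA$")
print(example)  # [10, 9, 1, 7, 3, 5, 6, 2, 8, 4]
```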
Gapped Suffix Arrays: a New Index Structure for Fast Approximate Matching
Approximate searching using an index is an important application in many fields. In this paper we introduce a new data structure called the gapped suffix array for approximate searching in the Hamming distance model. Building on the well known filtration approach for approximate searching, the use of the gapped suffix array can improve search speed by avoiding the merging of position lists.
Ultra-fast Multiple Genome Sequence Matching Using GPU
In this paper, a contrastive evaluation of massively parallel implementations of the suffix tree and the suffix array for accelerating genome sequence matching is proposed, based on an Intel Core i7 3770K quad-core processor and an NVIDIA GeForce GTX680 GPU. Besides the suffix array requiring only approximately 20%–30% of the space of the suffix tree, the coalesced binary search and tile optimization make suffix array clearl...
Trends in Suffix Sorting: A Survey of Low Memory Algorithms
The suffix array is a sorted array of all the suffixes in a string. This remarkably simple data structure is fundamental for string processing and lies at the heart of efficient algorithms for pattern matching, pattern mining, and data compression. In many applications suffix array construction, or equivalently suffix sorting, is a computational bottleneck and so has been the focus of intense r...
Suffix Arrays for Structural Strings
The structural match (s-match), originally addressed by the structural suffix tree, helps identify different RNA sequences with the same secondary structure. In this work, we introduce and construct the structural suffix array and structural longest common prefix array, i.e. lightweight suffix data structures for the s-match. Further, we illustrate how to use our data structures to support addi...